I created an Azure Function that runs daily and downloads the results of a search I set up on seek.com.au, using the Python package job-seeker.
You can find out more about job-seeker here.
Just to give a bit of context:
I run a daily job-seeker search for the term "data analyst", limited to jobs in "Adelaide SA", where I live.
I did a bit of data analysis on the results, and here is what I found.
# import packages
import json
from datetime import datetime
import pytz
from collections import defaultdict
import pandas as pd
import numpy as np
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from azure.storage.blob import BlobServiceClient
import hvplot.pandas
import holoviews as hv
hv.extension('bokeh')
from bokeh.models.formatters import NumeralTickFormatter
# setup local credentials
default_credential = DefaultAzureCredential()
secret_client = SecretClient(
    vault_url="https://junqueira-42.vault.azure.net/",
    credential=default_credential,
)
storage_credentials = secret_client.get_secret(name="storage-key")
# start a container client to query the latest blob
service = BlobServiceClient(account_url="https://storage4223.blob.core.windows.net", credential=storage_credentials.value)
# Instantiate a ContainerClient
container_client = service.get_container_client("jobsdatadetails")
blob_list = container_client.list_blobs()
blobs = [i.name for i in blob_list]
latest_download = blobs[-1]
latest_download
'data_analyst_jobs_detail_on_2021-10-17T06:30:00.488021+10:30.json'
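Taking the last item of `list_blobs()` relies on the service's listing order. Since the blob names embed an ISO-8601 timestamp, lexicographic order matches chronological order, so `max()` is a safer way to pick the latest file. A minimal sketch, using made-up neighbouring filenames around the real one above:

```python
# Blob names embed an ISO-8601 timestamp, so the lexicographically largest
# name is also the most recent download (the 10-15 and 10-16 names are made up).
blobs = [
    "data_analyst_jobs_detail_on_2021-10-15T06:30:00.111111+10:30.json",
    "data_analyst_jobs_detail_on_2021-10-17T06:30:00.488021+10:30.json",
    "data_analyst_jobs_detail_on_2021-10-16T06:30:00.222222+10:30.json",
]
latest_download = max(blobs)
print(latest_download)  # the 2021-10-17 file
```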
# download data
downloaded = container_client.download_blob(blob=latest_download)
text = downloaded.content_as_text()
data = json.loads(text)
# print example of jobs
example = data[1]
example
{'id': 54310577,
'listingDate': '2021-10-12T00:01:37.000Z',
'expiryDate': '2021-11-11T12:59:59.000Z',
'title': 'Microsoft BI / Analytics Consultant',
'advertiser': {'id': 2852,
'description': 'DWS Limited',
'searchParams': {'keywords': 'DWS'}},
'locationHierarchy': {'nation': 'Australia',
'state': 'South Australia',
'city': 'Adelaide',
'area': '',
'suburb': ''},
'locationId': 1007,
'stateId': 3104,
'workType': 'Full time',
'classification': {'id': 6281,
'description': 'Information & Communication Technology'},
'subClassification': {'id': 6287, 'description': 'Developers/Programmers'},
'salary': 'up to $140K pkg (inc super) dep. on exp.',
'salaryType': 'AnnualPackage',
'isLinkOut': True,
'isScreenAssigned': False,
'isSelectionCriteriaEnabled': False,
'status': 'Active',
'isRightToWorkRequired': True,
'hasRoleRequirements': False,
'roleRequirements': [],
'video': {'link': 'https://www.youtube.com/embed/hMCiZSDc7_s?rel=0',
'position': 'Below'},
'jobAdDetails': '** Due to security clearance requirements<strong> Australian Citizenship is essential for this role</strong>. Existing NV1 or baseline clearance is beneficial, but not essential **<br /><br /><br />As an expert with<strong> Microsoft BI (MSBI) SSIS / SSRS / SSAS</strong>, <strong>Power BI</strong>, and<strong>\xa0SQL Server </strong>and\xa0significant professional work experience in<strong> Business Intelligence (BI) and Analytics</strong><strong>\xa0-\xa0</strong>you understand the value of data to help solve business problems.\xa0<br />You will also have a passion for\xa0<strong>data visualisation</strong>\xa0whether its developing models, prototypes, reports or dashboards.<br /><br /> You’re happy to roll up your sleeves and see projects through from end-to-end including mapping business processes, conduct requirements analysis and documentation. Most importantly you understand that your role is all about <strong>helping our clients source, transform and interpret\xa0data into actionable metrics</strong>\xa0to optimise their business.<br /> <br />Our team of broadly experienced Consultants, work on a diverse portfolio of client, projects, and innovation initiatives - some of which will make you the envy of your peers. 
<br /> \xa0\xa0<br /><br /><strong>** Applicants must have Australian Citizenship due to security requirements **</strong><br /><br /><br /> <em>Required skills and experience:</em><br /> <ul> <li><strong>Previous experience\xa0</strong>in a similar role\xa0as a<strong> Microsoft\xa0BI / Analytics\xa0Specialist</strong></li> <li><strong>Significant, commercial, hands-on experience </strong>across the<strong> Microsoft BI stack (SSIS,\xa0SSRS,\xa0SSAS)</strong></li> <li>An\xa0<strong>expert in Power BI</strong> and<strong> SQL Server</strong></li> <li>You will likely also have experience with or knowledge of<strong> Azure</strong> data tools;<strong> Databricks</strong>,<strong> Data Factory</strong>,<strong> Data Lake</strong> etc.</li> <li>Experience working in a\xa0<strong>complex enterprise data warehouse\xa0</strong>environment</li> <li>Solid<strong> Agile</strong> environment experience</li> <li>Knowledge and experience of <strong>DWH</strong>,<strong> BI</strong>, and<strong> analytics\xa0best practice</strong></li> <li>Ability to <strong>develop requirements into actionable business\xa0analytics</strong> strategies, roadmaps, and BI solutions</li> <li>Able to quickly<strong> build strong relationships with decision-makers</strong>, and become their trusted advisor</li> <li>Knowledge of <strong>Python\xa0</strong>and\xa0<strong>R </strong>would be beneficial, but not essential</li> </ul> <strong>\xa0</strong>\xa0<br /><strong>About Us:</strong>\xa0<br /> Established in 1991, DWS is a professional Technology Consultancy and part of HCL Technologies. With offices across Australia, we have over 800 employees in Melbourne, Sydney, Adelaide, Brisbane and Canberra; providing services to a broad range of blue-chip corporate clients, as well as Federal, State and Local Government agencies. 
We provide a wide range of services including; IT consulting, managed application services, project management, customer driven innovation, digital transformation, data and business analytics, strategic advisory and productivity, and robotic process automation.<br /> <br /> <br /> <strong>Working at DWS offers you:</strong><br /> <ul> <li>Ongoing professional and career development</li> <li>Paid learning and certifications (chosen entirely by you), with completion bonuses</li> <li>An active social club that host regular, subsidised events</li> <li>Corporate discounts on a range of services and products, including health insurance, gym memberships, Apple products and much more\xa0</li> </ul> \xa0\xa0<br /><strong>** Applicants must have Australian Citizenship due to security requirements **<br /><br /></strong><br />Ready for a new challenge? Click\xa0<strong>APPLY now!</strong><br /> www.dws.com.au',
'companyReview': {'companyOverallRating': 2.8,
'companyTotalReviews': 72,
'companyProfileUrl': '/companies/dws-433963/reviews',
'companyName': 'DWS',
'companyId': 433963},
'contactMatches': [],
'branding': {'assets': {'cover': {'strategies': {'jdp-cover-mobile-small': 'https://image-service-cdn.seek.com.au/1efb87af8a2947c279b8d0757d278b8897562c11/5221220d3404cc46e5a66dd2a397fdb873cc9fb1',
'jdp-cover': 'https://image-service-cdn.seek.com.au/1efb87af8a2947c279b8d0757d278b8897562c11/0496d19fcee4c76ed2e0808dabea0a3d4985af52',
'jdpCover': 'https://image-service-cdn.seek.com.au/f23adcba2ddc8e4cfd0669d8e5ab9b1607a08832/de415285a7b2c9b5ef3fca83f12b73bfb39546c3',
'jdpCoverThumbnail': 'https://image-service-cdn.seek.com.au/f23adcba2ddc8e4cfd0669d8e5ab9b1607a08832/11d49ff9c5402e27f0a02013bc732148ba0fbc11'},
'url': 'https://image-service-cdn.seek.com.au/f23adcba2ddc8e4cfd0669d8e5ab9b1607a08832',
'id': 'f23adcba2ddc8e4cfd0669d8e5ab9b1607a08832'},
'logo': {'strategies': {'serpLogo': 'https://image-service-cdn.seek.com.au/2193cd5731de212ae9c764e394cec61a08355b24/dffceafa8da80ae865f1fa835a87a74a26be6546',
'jdp-logo': 'https://image-service-cdn.seek.com.au/2193cd5731de212ae9c764e394cec61a08355b24/e080dbc8bb6734b54ec25b26f8a671c27122cfd8',
'serp-logo': 'https://image-service-cdn.seek.com.au/2193cd5731de212ae9c764e394cec61a08355b24/5be3aa712f8d81183274066e1385941133e41a1b',
'jdpLogo': 'https://image-service-cdn.seek.com.au/2193cd5731de212ae9c764e394cec61a08355b24/f3c5292cec0e05e4272d9bf9146f390d366481d0'},
'url': 'https://image-service-cdn.seek.com.au/2193cd5731de212ae9c764e394cec61a08355b24',
'id': '2193cd5731de212ae9c764e394cec61a08355b24'}},
'updatedOn': '2021-02-04T04:41:44.805Z',
'legacyId': 296010,
'createdOn': '2019-02-01T05:21:20.542Z',
'isDefault': True,
'id': '761d01f9-1634-671a-19ca-a8a6a63af521',
'name': 'DWS (default)',
'ownerId': 'seekId:2852',
'state': 'ACTIVE',
'owner': {'type': 'seekId', 'id': '2852'}},
'roleTitles': 'business-intelligence-analytics-consultant,analytics-consultant,consultant',
'isPrivateAdvertiser': False}
# object keys
keys = ['id', 'listingDate', 'expiryDate', 'title',
'advertiser', 'locationHierarchy', 'locationId',
'stateId', 'workType', 'classification', 'subClassification',
'salary', 'salaryType', 'isLinkOut', 'isScreenAssigned',
'isSelectionCriteriaEnabled', 'status', 'isRightToWorkRequired',
'hasRoleRequirements', 'roleRequirements', 'video', 'jobAdDetails',
'companyReview', 'contactMatches', 'branding', 'roleTitles', 'isPrivateAdvertiser']
# extract data from each object and put it in a data structure to analyse with pandas
data_df = defaultdict(list)
for job in data:
    data_df["id"].append(job.get("id"))
    data_df["listingDate"].append(job.get("listingDate"))
    data_df["expiryDate"].append(job.get("expiryDate"))
    data_df["title"].append(job.get("title"))
    data_df["advertiser"].append(job["advertiser"]["description"])
    data_df["salary"].append(job.get("salary"))
    data_df["salaryType"].append(job.get("salaryType"))
    data_df["roleTitles"].append(job.get("roleTitles"))
    if job.get("companyReview"):
        data_df["rating"].append(job["companyReview"]["companyOverallRating"])
        data_df["totalReviews"].append(job["companyReview"]["companyTotalReviews"])
    else:
        data_df["rating"].append(None)
        data_df["totalReviews"].append(None)
df = pd.DataFrame(data_df)
# shape of the data
df.shape
(624, 10)
# check how many of the jobs are really data jobs
# create a column flagging role titles that contain key words I consider a true data job
key_words = ["data-analyst", "data-engineer",
             "data-scientist", "business-intelligence",
             "tableau-consultant",
             "database-developer",
             "analytics-developer",
             "database-administrator",
             "database-programmer", "scientist",
             "integration-analyst",
             ]

def find_real_data_jobs(title, *key_words):
    return any(w in title.lower() for w in key_words)

# args unpacks the key_words list, so each keyword arrives as a positional argument
df["data_job"] = (df["roleTitles"].apply(lambda x: x if x else "")
                  .apply(find_real_data_jobs, args=(key_words))
                  )
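To see how the keyword match behaves, here is a small self-contained sketch with a shortened keyword list and made-up role-title strings:

```python
# Shortened keyword list and hypothetical titles, just to illustrate the match.
key_words = ["data-analyst", "data-engineer", "data-scientist"]

def find_real_data_jobs(title, *key_words):
    # True if any keyword appears anywhere in the (lower-cased) roleTitles string
    return any(w in title.lower() for w in key_words)

print(find_real_data_jobs("senior-data-analyst,analyst", *key_words))  # True
print(find_real_data_jobs("marketing-coordinator", *key_words))        # False
```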
Next, a series of small functions to clean the salary text so it can be analysed; for example, the text "up to $140K pkg (inc super) dep. on exp." should become 140,000.
def useful_salary_data(text):
    return text if text and "$" in text else ""

# exclude salary ads that are quoted per hour or per day
garbage = ["per hour", "p.d.", "p.h.", "per day"]

def remove_garbage(text, *garbage):
    return text if not any(g in text for g in garbage) else ""

def get_dolars(sentence):
    return " ".join(word for word in sentence.split() if "$" in word)

# extract the average salary from the remaining text
def average_salary(text_salary):
    n_salaries = len(text_salary.split())
    in_Ks = "k" in text_salary.lower()
    # skip hourly rates and salary ranges, then strip formatting characters
    salaries = [salary.lower().strip("k").strip("au$").strip("'s").replace(",", "").replace("($", "").replace(")", "")
                for salary in text_salary.split()
                if "/h" not in text_salary and "-" not in text_salary]
    salaries = [float(salary) if salary else 0 for salary in salaries]
    if salaries:
        average = sum(salaries) / n_salaries
    else:
        average = 0
    return average if not in_Ks else average * 1000
df["average_salary"] = (df["salary"].apply(useful_salary_data)
                        .apply(remove_garbage, args=(garbage))
                        .apply(get_dolars)
                        .apply(average_salary)
                        )
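To make the chain concrete, here is a condensed, self-contained version of the cleaning steps applied to the example string from the job ad above. The logic is simplified (the range/hourly check is folded into one condition), so treat it as a sketch of the pipeline rather than the exact functions:

```python
# Condensed sketch of the salary-cleaning chain on the example ad text.
garbage = ["per hour", "p.d.", "p.h.", "per day"]

def useful_salary_data(text):
    return text if text and "$" in text else ""

def remove_garbage(text, *garbage):
    return text if not any(g in text for g in garbage) else ""

def get_dolars(sentence):
    return " ".join(word for word in sentence.split() if "$" in word)

def average_salary(text_salary):
    # hourly rates and salary ranges are skipped entirely
    if not text_salary or "/h" in text_salary or "-" in text_salary:
        return 0
    in_ks = "k" in text_salary.lower()
    salaries = [float(w.lower().strip("k").strip("au$(')").replace(",", ""))
                for w in text_salary.split()]
    average = sum(salaries) / len(salaries)
    return average * 1000 if in_ks else average

text = "up to $140K pkg (inc super) dep. on exp."
value = average_salary(get_dolars(remove_garbage(useful_salary_data(text), *garbage)))
print(value)  # 140000.0
```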
# create a job classification to analyse different types of jobs
def data_job_class(title):
    other_data = [
        "business-intelligence",
        "tableau-consultant",
        "database-developer",
        "analytics-developer",
        "database-administrator",
        "database-programmer", "scientist",
        "integration-analyst",
    ]
    if title:
        if "data-scientist" in title:
            return "data-scientist"
        if "data-engineer" in title:
            return "data-engineer"
        if "data-analyst" in title:
            return "data-analyst"
        if any(o in title for o in other_data):
            return "other-data"
    return "non-data"
# data classification labels column
df["role_titles_data"] = (df["roleTitles"].apply(lambda x: x if x else "")
                          .apply(data_job_class)
                          )
# extract only records with salary data
df_with_salaries = df[df["average_salary"] > 0]
print(f"only {df_with_salaries.shape[0]} jobs have salary data")
only 129 jobs have salary data
- Only 47 jobs are really data jobs, roughly 7% of the total Seek search
df["data_job"].value_counts().hvplot.bar()
- Data jobs are, on average, better paid than "other" jobs
formatter = NumeralTickFormatter(format="$0,0")
df_with_salaries.hvplot.kde(y='average_salary', xformatter=formatter, by="data_job")
- Only 5 Data Scientist and 5 Data Engineer jobs; the rest were somewhat related to data but not in the "rare skill" category.
df[df["role_titles_data"]!="non-data"]["role_titles_data"].value_counts().hvplot.bar()
- Data Scientist roles are really well paid.
- No Data Engineer ad had salary data to compare
df_with_salaries.hvplot.kde(y='average_salary', xformatter=formatter, by="role_titles_data")
australia = pytz.timezone("Australia/Sydney")
def ad_duration(row):
    return (pd.to_datetime(row["expiryDate"]) - pd.to_datetime(row["listingDate"])).days
df["ad_duration"] = df.apply(ad_duration, axis=1)
df["ad_age"] = (australia.localize(datetime.now()) - pd.to_datetime(df["listingDate"]).dt.tz_convert('Australia/Sydney')).dt.days
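As a quick sanity check of the duration arithmetic, here it is applied to the listing and expiry dates of the example ad shown earlier:

```python
import pandas as pd

# listingDate and expiryDate from the sample ad above:
# 2021-10-12T00:01:37Z to 2021-11-11T12:59:59Z is a roughly 30-day ad.
listing = pd.to_datetime("2021-10-12T00:01:37.000Z")
expiry = pd.to_datetime("2021-11-11T12:59:59.000Z")
duration = (expiry - listing).days
print(duration)  # 30
```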
- Most of the ads have a duration of 30 days
df["ad_duration"].hvplot.kde(y='ad_duration')
df["advertised_salary"] = df["salary"].apply(useful_salary_data).apply(lambda x: x!= "")
- Although 152 ads advertised a salary, the parser could only extract analysable data from 129
df["advertised_salary"].value_counts().hvplot.bar()
df["ad_age"].hvplot.kde(title="Age of job ad as of today")
# query data to analyse company ratings
jobs_with_rating = df[df["rating"].notna()]
def top10_rating(df):
    return (pd.pivot_table(data=df, index=["advertiser"],
                           values=["rating"], aggfunc=np.mean)
            .sort_values(by="rating", ascending=False)[:10])

def bottom10_rating(df):
    return (pd.pivot_table(data=df, index=["advertiser"],
                           values=["rating"], aggfunc=np.mean)
            .sort_values(by="rating")[:10])
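A tiny self-contained illustration of the pivot-and-sort used by these helpers, on made-up advertiser ratings (using the string `"mean"` for `aggfunc`, which is equivalent to `np.mean` here):

```python
import pandas as pd

# Toy advertiser/rating data (made up) to show the pivot + sort pattern.
toy = pd.DataFrame({
    "advertiser": ["A", "A", "B", "C"],
    "rating": [4.0, 3.0, 2.5, 5.0],
})
top = (pd.pivot_table(data=toy, index=["advertiser"],
                      values=["rating"], aggfunc="mean")
       .sort_values(by="rating", ascending=False)[:10])
print(top)  # C (5.0), A (3.5), B (2.5)
```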
top_10_all_jobs_company = top10_rating(jobs_with_rating)
bottom_10_all_jobs_company = bottom10_rating(jobs_with_rating)
top_10_all_jobs_company.hvplot.bar(rot=45)
bottom_10_all_jobs_company.hvplot.bar(rot=45)
data_jobs_with_rating = jobs_with_rating[jobs_with_rating["data_job"]==True]
top_10_data_jobs_company = top10_rating(data_jobs_with_rating)
bottom_10_data_jobs_company = bottom10_rating(data_jobs_with_rating)
top_10_data_jobs_company
| advertiser | rating |
|---|---|
| BAE Systems | 3.8 |
| KBR - Kellogg Brown & Root Pty Ltd | 3.7 |
| Elders Limited | 3.6 |
| Optus | 3.5 |
| City of Adelaide | 3.3 |
| APA Group | 3.2 |
| Wipro Limited | 3.2 |
| S.A. Health | 3.1 |
| Bendigo and Adelaide Bank | 3.0 |
| GFG Alliance | 3.0 |
bottom_10_data_jobs_company
| advertiser | rating |
|---|---|
| Detmold Group | 2.4 |
| Capgemini Australia Pty Ltd | 2.8 |
| DWS Limited | 2.8 |
| Bendigo and Adelaide Bank | 3.0 |
| GFG Alliance | 3.0 |
| S.A. Health | 3.1 |
| APA Group | 3.2 |
| Wipro Limited | 3.2 |
| City of Adelaide | 3.3 |
| Optus | 3.5 |
# prepare data to plot heat map of salary vs company rating
# note the parentheses: & binds tighter than >, so the comparison must be grouped
df_with_rating_and_salary = df[(df["average_salary"] > 0) & df["rating"].notna()]
salary_rating = pd.pivot_table(df_with_rating_and_salary,
index=["advertiser"],
columns=["rating"],
values=["average_salary"],
aggfunc=np.mean
)
# remove index name from pivot table
salary_rating.index.name = None
# flatten multiindex
salary_rating.columns = salary_rating.columns.get_level_values(1)
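To show what the flattening step does, here is a minimal sketch on a toy pivot: with a `values` list, `pivot_table` produces a column MultiIndex like `('average_salary', rating)`, and `get_level_values(1)` keeps only the rating level:

```python
import pandas as pd

# Toy data (made up) to illustrate flattening a pivot's column MultiIndex.
pivot = pd.pivot_table(
    pd.DataFrame({"advertiser": ["A", "B"],
                  "rating": [3.0, 4.0],
                  "average_salary": [90000.0, 110000.0]}),
    index=["advertiser"], columns=["rating"], values=["average_salary"],
)
print(pivot.columns)  # MultiIndex: ('average_salary', 3.0), ('average_salary', 4.0)
pivot.columns = pivot.columns.get_level_values(1)
print(pivot.columns.tolist())  # [3.0, 4.0]
```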
- Not many companies had both a rating and an advertised salary, not enough to spot any correlation between good salary and good rating.
salary_rating.hvplot.heatmap(title="List of companies with rating and salary data",
height=500, width=800,colorbar=False)
# plotting transposing
salary_rating.T.hvplot.heatmap(title="List of companies with rating and salary data",
height=500, width=800,colorbar=False, rot=90)
- Only 47 jobs are really data jobs, roughly 7% of the total Seek search
- Data jobs are, on average, better paid than "other" jobs
- Of the data jobs, only 5 were Data Scientist and 5 Data Engineer roles; the rest were somewhat related to data but not in the "rare skill" category
- Data Scientist roles are really well paid
- No Data Engineer ad had salary data to compare
- Most of the ads have a duration of 30 days
- Although 152 ads advertised a salary, the parser could only extract analysable data from 129
- Not many companies had both a rating and an advertised salary, not enough to spot any correlation between good salary and good rating
- There are not as many data jobs out there as I thought, judging by this Seek search